130. Edouard Harris - New Research: Advanced AI may tend to seek power by default

Update: 2022-10-12

Description

Progress in AI has been accelerating dramatically in recent years, and even months. It seems like every other day, there’s a new, previously-believed-to-be-impossible feat of AI that’s achieved by a world-leading lab. And increasingly, these breakthroughs have been driven by the same, simple idea: AI scaling.

For those who haven’t been following the AI scaling saga, scaling means training AI systems with larger models, using increasingly absurd quantities of data and processing power. So far, empirical studies by the world’s top AI labs seem to suggest that scaling is an open-ended process that can lead to more and more capable and intelligent systems, with no clear limit.

And that’s led many people to speculate that scaling might usher in a new era of broadly human-level or even superhuman AI — the holy grail AI researchers have been after for decades.

And while that might sound cool, an AI that can solve general reasoning problems as well as or better than a human might actually be an intrinsically dangerous thing to build.

At least, that’s the conclusion that many AI safety researchers have come to following the publication of a new line of research that explores how modern AI systems tend to solve problems, and whether we should expect more advanced versions of them to exhibit dangerous behaviours like seeking power.

This line of research in AI safety is called “power-seeking”, and although it’s currently not well understood outside the frontier of AI safety and AI alignment research, it’s starting to draw a lot of attention. For example, the first major theoretical study of power-seeking, led by Alex Turner, who’s appeared on the podcast before, was published at NeurIPS (the world’s top AI conference).

And today, we’ll be hearing from Edouard Harris, an AI alignment researcher and one of my co-founders at the AI safety company Gladstone AI. Ed’s just completed a significant piece of AI safety research that extends Alex Turner’s original power-seeking work, and that shows what seems to be the first experimental evidence suggesting that we should expect highly advanced AI systems to seek power by default.

What does power-seeking really mean, though? And what does all this imply for the safety of future, general-purpose reasoning systems? That’s what this episode will be all about.

***

Intro music:

- Artist: Ron Gelinas

- Track Title: Daybreak Chill Blend (original mix)

- Link to Track: https://youtu.be/d8Y2sKIgFWc

***

Chapters:

- 0:00 Intro

- 4:00 Alex Turner's research

- 7:45 What technology wants

- 11:30 Universal goals

- 17:30 Connecting observations

- 24:00 Micro power seeking behaviour

- 28:15 Ed's research

- 38:00 The human as the environment

- 42:30 What leads to power seeking

- 48:00 Competition as a default outcome

- 52:45 General concern

- 57:30 Wrap-up

